1 Introduction


Proponents of generalized phrase structure grammar (GPSG) cite its weak 
context-free generative power as proof of the computational tractability of 
GPSG-Recognition. Since context-free languages (CFLs) can be parsed in 
time proportional to the cube of the sentence length, and GPSGs only
generate CFLs, it seems plausible that GPSGs can also be parsed in cubic time.
Gazdar (1981:155) argues that this would in turn provide “the beginnings 
of an explanation for the obvious, but largely ignored, fact that humans 
process the utterances they hear very rapidly.” 1 

This widely assumed GPSG “efficient parsability” result is misleading:
parsing the sentences of an arbitrary GPSG is likely to be intractable,
because a reduction from 3SAT proves that the universal recognition problem
(RP) for the GPSGs of Gazdar (1981) is NP-hard. This complexity
classification means that, unless P = NP, no recognition algorithm for
GPSGs runs in polynomial time, and the fastest one could take exponential
time. Therefore nothing in the GPSG formal framework guarantees efficient
parsability, contrary to Gazdar’s argument from weak context-free
generative power. Crucially, the time to parse a sentence of a CFL can be
the product of sentence length cubed and context-free grammar size squared,
and the GPSG grammar can result in an exponentially large set of derived
context-free rules. A central object in GPSG theory, the metarule,
inherently results in an intractable parsing problem, even when severely
constrained. Section 2.3 below contains a formal proof.

The apparent paradox between the efficiency of context-free parsing and 
the intractability of GPSG parsing is resolved below. I also discuss how the 
recognition problem is posed and the implications of this result for linguistics 
and natural language parsing. 


2 Complexity of GPSG-Recognition 

This section begins by formally specifying the class of generalized phrase 
structure grammars described by Gazdar (1981). 2 After providing some 

1 Joshi (1985:226) and Peterson (1985:315) make similar assumptions about the
connection between weak context-free generative power and efficient processability.

2 GPSG theory has changed considerably since Gazdar (1981). Gazdar, Klein, Pullum 
and Sag (1985), henceforth GKPS, contains a detailed and precise formal exposition of 
current GPSG theory. Ristad (1986a) analyzes the computational complexity of that 
relevant technical background, I prove that the problem of determining for
an arbitrary GPSG G and input string w whether w is in the language
L(G) generated by G is as hard as any nondeterministic polynomial time
computation (NP-hard), and hence likely to be intractable.


2.1 Formal Specification of GPSG 

The GPSGs of Gazdar (1981) contain sets of nonterminal symbols $V_N$,
terminal symbols $V_T$, basic rules R, and metarules M. Basic rules are the
set of rules required by a context-free grammar for English not handling
unbounded dependencies. They are interpreted as node admissibility
conditions rather than conventional context-free rewrite rules. An example basic
rule is 1a, which is equivalent to the rewrite rule 1b.

(1) a. $[_S\ \mathrm{NP}\ \mathrm{VP}]$

    b. $[S \to \mathrm{NP}\ \mathrm{VP}]$


Metarules are a grammar for generating a grammar. They typically express
many of the linguistic relationships expressed by transformations in
transformational grammar. Unlike transformations, whose domain can be an entire
tree, the domain of metarules is effectively restricted to trees of depth one.
Formally, they are functions from basic rules to sets of context-free rules
with fixed input and output patterns that contain variables and constants.
If a context-free rule matches the input pattern under some specialization of
the metarule’s variables, then the metarule generates a context-free rule
corresponding to the metarule’s output pattern under the same specialization
of the metarule’s variables. Metarule 2 performs Subject-Auxiliary inversion
in Gazdar (1981):


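$$[\mathrm{VP} \to \underset{[+\mathrm{fin},\,+\mathrm{aux}]}{V}\ X] \;\Longrightarrow\; [Q \to \underset{[+\mathrm{fin},\,+\mathrm{aux}]}{V}\ \mathrm{NP}\ X] \qquad (2)$$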


Metarule 2 states that for every rule which expands a finite VP and
introduces a tensed auxiliary verb (e.g. the rule that generates structure 3), there

theory and proves that the universal RP for the GPSGs of GKPS can take more than
exponential time, that is, time proportional to $c^{f(n)}$ for some constant c, polynomial f(n),
and input string and grammar size n. Section 3 below explores the practical implications
of exponential time results.
is also a rule which expands the sentential category Q as that tensed
auxiliary verb, followed by an NP and whatever else followed the tensed auxiliary
in the original rule (e.g. the rule that generates 4).


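$$[_{\mathrm{VP}}\ [_{V[+\mathrm{fin},\,+\mathrm{aux}]}\ \text{is}]\ [_{\mathrm{AP}}\ \text{stupid}]] \qquad (3)$$

$$[_{Q}\ [_{V[+\mathrm{fin},\,+\mathrm{aux}]}\ \text{is}]\ [_{\mathrm{NP}}\ \text{Kim}]\ [_{\mathrm{AP}}\ \text{stupid}]] \qquad (4)$$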


The set of derived rules is closed under metarule application. That is, the 
complete set of derived rules in a GPSG consists of the basic rules plus 
the maximal rule set that can be arrived at by repeatedly applying each 
metarule to each derived rule. 
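
The closure just described can be sketched in a few lines of Python. This is a minimal illustration, not the GPSG formalism itself: the tuple encoding of rules, the category label "V[+fin,+aux]", and the function names are all assumptions made for the example, and the toy metarule returns a single rule rather than a set of rules.

```python
def subject_aux_inversion(rule):
    """Toy version of metarule 2: [VP -> V X] ==> [Q -> V NP X] for a finite auxiliary V."""
    lhs, rhs = rule
    if lhs == "VP" and rhs and rhs[0] == "V[+fin,+aux]":
        return ("Q", (rhs[0], "NP") + rhs[1:])
    return None  # the input pattern does not match this rule


def close_under_metarules(basic_rules, metarules):
    """Return the basic rules plus everything reachable by repeated metarule application."""
    derived = set(basic_rules)
    frontier = list(derived)
    while frontier:
        rule = frontier.pop()
        for metarule in metarules:
            new_rule = metarule(rule)
            if new_rule is not None and new_rule not in derived:
                derived.add(new_rule)
                frontier.append(new_rule)
    return derived


# Hypothetical two-rule grammar: a rule is (left-hand side, tuple of daughters).
basic = {
    ("S", ("NP", "VP")),
    ("VP", ("V[+fin,+aux]", "AP")),
}
for lhs, rhs in sorted(close_under_metarules(basic, [subject_aux_inversion])):
    print(lhs, "->", " ".join(rhs))
```

The loop stops once no metarule produces a rule that is not already in the derived set, which is the sense in which the derived rule set is closed.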

Unconstrained metarule application may generate infinite sets of rules
and describe arbitrary languages, including recursively enumerable ones. To
preserve both the weak context-free generative power of a GPSG grammar
and the supposedly attendant computational benefits, some formal
constraints on metarules have been proposed in the GPSG literature. One
proposal is to constrain variables in the metarule pattern to be
“abbreviatory variables,” i.e. variables that can only stand for strings in a
finite and extrinsically determined range. Formally, each metarule variable
may only range over a finite subset of $(V_N \cup V_T)^*$. While this constraint
can affect the extensional language of the grammar in linguistically
unmotivated and arbitrary ways, I adopt a stronger version of this constraint
for the purposes of examining its computational implications. In the proof
below, all metarule variables either are constants (e.g. 0, 1) or stand for a
single symbol and can only have two possible values (e.g. x can only stand
for the symbols 0, 1). See Shieber et al. (1983) for a discussion of Gazdar’s
1982 ‘abbreviatory variables’ proposal and some other proposals for
restricting metarules.

I further restrict the derivational power of metarules as follows. Metarules 
are functions from context-free rules to context-free rules (not from rules to 
sets of rules) that may only operate once in the derivation of a given rule. 
That is, a metarule pattern may match a rule in only one way, and metarules 
may not operate recursively on their own output. In addition, no
unrecoverable deletion can occur in a derivation and no two metarules or basic rules
may be identical either in pattern or function. 

2.2 The Classifications of Complexity Theory

Mathematical complexity theory measures the intrinsic lower-bound
difficulty of obtaining the solution to a problem no matter how the solution
is obtained. Complexity theory studies the structure of problems: it
classifies problems according to the amount of computational resources (e.g.
time, space, electricity) needed to solve them on some abstract machine
model (e.g. a deterministic Turing machine). Complexity classifications
are invariant across a wide range of primitive machine models, all choices
of representation, algorithm, and actual implementation, and even the
resource measure itself. The robustness of these classifications is especially
appropriate for cognitive science: while we do know what abstract problems
the brain solves, we don’t know much about the representations, algorithms,
or hardware involved.

P is the natural and important class of problems solvable in deterministic
polynomial time, that is, problems with efficient solutions. NP is the class of
all problems solvable in Nondeterministic Polynomial time. Informally, a
problem is in NP if we can guess an answer to the problem and then verify
its correctness in polynomial time. For example, the problem of deciding
whether a whole number i is composite is in NP because it can be solved
by guessing a pair of potential divisors, and then quickly checking if their
product equals i. A problem T is NP-hard if it is at least as hard
computationally as any problem in the class NP: if we had a subroutine that
solved T in polynomial time, then we could write a program to solve any
problem in NP in polynomial time on a deterministic Turing machine. Note
that T need not be in NP to be NP-hard. A problem is NP-complete if it is
both in NP and NP-hard. All known methods for solving NP-hard problems
are too slow for even the fastest computers on large inputs. Since it is
widely believed, though not proved, that no faster methods of solution can
ever be found for these problems, NP-complete problems are considered
computationally infeasible. A famous NP-complete problem is the traveling
salesman problem, that is, to find the shortest route for a traveling salesman
who must visit a number of cities and return to the city he started at. Lewis
and Papadimitriou (1978) informally discuss these problems and other
complexity issues. Barton, Berwick, and Ristad (1986, forthcoming) further
explore the relationship between computational complexity and natural language.

Complexity classifications are established with the proof technique of
reduction. A reduction converts instances of a problem T of known complexity
into instances of a problem S whose complexity we wish to determine. The
reduction operates in polynomial time. Therefore, if we had a polynomial
time algorithm for solving S, then we could also solve T in polynomial time,
simply by converting instances of T into instances of S. (This follows because
the composition of two polynomial time functions is also polynomial time.)
Formally, if we choose T to be NP-complete, then the polynomial time reduction
shows that S is at least as hard as T, or NP-hard. If we were also to prove
that S was in NP, then S would be NP-complete.

In this case, the known NP-complete problem T is 3SAT, and the problem
S of unknown complexity is GPSG-Recognition. Therefore, the proof
will reduce instances of 3SAT (a 3-CNF Boolean formula F) into instances of
GPSG-Recognition (a GPSG G and an input string x). The 3-Satisfiability
problem (3SAT) is to determine, given a Boolean expression in 3-CNF,
whether the formula is satisfiable. 3SAT is NP-complete. An example of a
satisfiable 3-CNF Boolean formula with five clauses is:

$(a \vee b \vee c) \wedge (a \vee d \vee e) \wedge (e \vee d \vee c) \wedge (b \vee c \vee d) \wedge (a \vee d \vee e)$

A Boolean expression is an expression composed of variables (e.g. x),
parentheses, and the logical operators $\vee$ (OR), $\wedge$ (AND), and negation.
Negation is represented as a horizontal bar over the negated expression (e.g.
$\bar{x}$ is the negation of the variable x). A literal is a variable or the
negation of a variable. Variables may have the values 0 (false) and 1 (true),
as do expressions. An expression is satisfiable if there is some assignment of
0’s and 1’s to the variables that gives the expression the value 1.

A Boolean expression is in conjunctive normal form (CNF) if it is of the
form $E_1 \wedge E_2 \wedge \cdots \wedge E_k$ and each clause $E_i$ is of the form
$a_{i1} \vee a_{i2} \vee \cdots \vee a_{i m_i}$, where each $a_{ij}$ is a literal, either a
variable x or a negated variable $\bar{x}$. An expression is in 3-CNF if each
clause in the CNF expression contains exactly three distinct literals.
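
The definition of satisfiability can be made concrete with a brute-force check. The sketch below is only an illustration: the integer encoding of literals (+v for the v-th variable, -v for its negation), the function name, and the two-clause formula are assumptions chosen for the example. It tries all 2^n assignments, which is exactly the exponential search that a polynomial time algorithm would have to avoid.

```python
from itertools import product


def find_satisfying_assignment(clauses, n):
    """Return a tuple of n truth values (0/1) that satisfies every clause, or None.

    Each clause is a tuple of nonzero integers: +v is variable v, -v its negation.
    """
    for assignment in product((0, 1), repeat=n):
        def literal_is_true(lit):
            value = assignment[abs(lit) - 1]
            return value == 1 if lit > 0 else value == 0

        # A CNF formula is true iff every clause contains at least one true literal.
        if all(any(literal_is_true(lit) for lit in clause) for clause in clauses):
            return assignment
    return None


# Hypothetical 3-CNF formula over three variables: (x1 or not x2 or x3) and (not x1 or x2 or not x3)
formula = [(1, -2, 3), (-1, 2, -3)]
print(find_satisfying_assignment(formula, 3))
```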

2.3 Reduction from 3SAT to GPSG-Recognition 

Recall that a reduction is an algorithm for converting instances of one
problem into instances of another problem. In the reduction below, it is
important to distinguish the process by which the metarules are constructed
(the reduction) from the process by which metarules are applied in GPSG
to generate a set of context-free rules. In particular, the reduction does not
generate all possible truth assignments for the variables in F; this would of
course require exponential time, and invalidate the reduction. The work of
generating all possible truth assignments is done by the metarule
application process, not by metarule construction. For example, in part 2.c of the
following proof, the reduction tests $w_i$ and $q_j$ for equality in order to
construct the metarules which instantiate negated variables. The reduction will
require $O(m^3 \log m)$ time to construct the |w| metarules. The constructed
metarules, however, never test $w_i$ and $q_j$ for equality in the metarule
application process.

Theorem 1 GPSG-Recognition is NP-hard 


Proof. The proof will reduce 3SAT to GPSG-Recognition in polynomial
time. Assume as input a 3-CNF formula F of length m using the n variables
$q_1, q_2, \ldots, q_n$. Let w be the string of formula literals in F; in general,
$w_i$ will denote the i-th symbol in the string w.

In the following reduction, S will be the distinguished start nonterminal
of the constructed GPSG; 0 and 1 will be special metarule constants; $a_i$ and
$b_j$ will be metarule variables that range over the metarule constants 0, 1; and
A, B, #, and ‡ will be special grammar symbols. I will use $0^i$ to denote the
length i string of all 0’s, where 0 is a metarule constant. Thus, $0^5$ denotes
00000.

The reduction constructs a GPSG grammar G such that the special 
symbol # is an element of L(G) iff F is satisfiable. The idea is that the 
constructed GPSG will guess an assignment of truth values for the formula 
variables, and then determine if the guessed assignment satisfies the formula 
F. 

The constructed metarules will use the right-hand side of the context-free
rules to record guesses and as a scratchpad during the evaluation of the
guess. The string x to the left of the “‡” symbol represents the formula F,
where $x_i$ is the literal in the i-th position of w, the debracketing of F. The
string y to the right of the “‡” symbol encodes an assignment of truth values
to variables, where the j-th position $y_j$ stores the truth value assigned to
the formula variable $q_j$.

To “guess” an assignment, the metarules will generate all possible truth
assignments to the n variables, exactly as a deterministic Turing machine
might if it were simulating a nondeterministic Turing machine. To verify
the guess, some metarules substitute the guessed variable values for the
formula literals, and other metarules “evaluate” the instantiated formula.
For the 3-CNF formula to be true, every clause must be true, and for a
clause to be true, at least one of the three literals in the clause must be
true. The following metarules exploit the regularity of 3-CNF formulas in a
very obvious manner.

G contains 

1. n + 1 basic rules:

the i-th rule: $[A \to 0^{|w|} \ddagger 1^i 0^{n-i}]$, where $0 \le i \le n$

The basic rules, in conjunction with some of the metarules, will
generate all possible assignments to the formula variables.

2. the metarules

(a) $\frac{1}{2}n(n-1)$ metarules, which generate all possible truth assignments
when they are applied to the basic rules. (n is the number of
distinct variables in the formula F.) For all i and for all j,
$1 \le i < j \le n$, construct the metarule:

$[A \to 0^{|w|} \ddagger a_1 \ldots a_i \ldots a_j \ldots a_n] \Rightarrow$

$[A \to 0^{|w|} \ddagger a_1 \ldots a_{i-1}\, a_j\, a_{i+1} \ldots a_{j-1}\, a_i\, a_{j+1} \ldots a_n]$

These metarules exchange any two symbols (in positions i and
j) in a string of length n. Therefore, if one of these metarules
applies to a rule, it will exchange the truth values assigned by
the rule to the variables $q_i$ and $q_j$. Since any subset of these
metarules can apply one by one to the basic rules before the next
metarule (immediately below) shuts off the process, any possible
truth assignment to the variables can be encoded in the substring
to the right of the “‡” symbol. (See the formal proof of lemma 1
below.)

(b) One metarule, to stop the generation of truth assignments:

$[A \to 0^{|w|} \ddagger a_1 \ldots a_n] \Rightarrow [B \to 0^{|w|} \ddagger a_1 \ldots a_n]$

This metarule prevents any of the metarules described above
from changing a guess while the following metarules determine if
the guess satisfies F.

(c) |w| metarules are needed to instantiate the variable truth
assignments generated by the preceding metarules into the formula
literals. Note that the basic rules assign formula literals the truth
value 0 by default. Consequently, we only need to instantiate
literals with the truth value 1. For each literal in F, there will be
exactly one metarule: i will index the i-th literal in w, and j will
index the corresponding formula variable, whose value is encoded
in the string to the right of the ‡ symbol. A negative literal (e.g.
$\bar{a}$) will be true when its variable is false, while a positive literal
(e.g. a) will be true when its variable is true. Formally, include
the following metarules for all i, $1 \le i \le |w|$:

• If $w_i = q_j$ for some j, then construct this metarule to watch
for $q_j$ being true:

$[B \to a_1 \ldots a_{i-1}\, 0\, a_{i+1} \ldots a_{|w|} \ddagger b_1 \ldots b_{j-1}\, 1\, b_{j+1} \ldots b_n] \Rightarrow$

$[B \to a_1 \ldots a_{i-1}\, 1\, a_{i+1} \ldots a_{|w|} \ddagger b_1 \ldots b_{j-1}\, 1\, b_{j+1} \ldots b_n]$

• Otherwise, $w_i = \bar{q}_j$ for some j, and we construct this metarule
instead to watch for $q_j$ being false:

$[B \to a_1 \ldots a_{i-1}\, 0\, a_{i+1} \ldots a_{|w|} \ddagger b_1 \ldots b_{j-1}\, 0\, b_{j+1} \ldots b_n] \Rightarrow$

$[B \to a_1 \ldots a_{i-1}\, 1\, a_{i+1} \ldots a_{|w|} \ddagger b_1 \ldots b_{j-1}\, 0\, b_{j+1} \ldots b_n]$

Note that these metarules instantiate the negation of formula
variable $q_j$ in the i-th position of w.

(d) Include $\frac{7}{3}|w|$ metarules to verify that the guessed assignment
satisfies F. (Recall that the formula F is in 3-CNF, and therefore |w|
will be a multiple of 3.) According to the definition of 3SAT, a
3-CNF formula is true if all its clauses are true, and a clause is true
if any of its three literals is true. Accordingly, these metarules
will erase a 3-CNF clause iff at least one of its three literals is
true. Therefore, if the metarules can erase the entire string to
the left of the “‡” symbol, then the formula is satisfiable. There
are seven such metarules for each k, $0 \le k \le \frac{|w|}{3} - 1$:

$[B \to 001\, a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n] \Rightarrow [B \to a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n]$
$[B \to 010\, a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n] \Rightarrow [B \to a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n]$
$[B \to 100\, a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n] \Rightarrow [B \to a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n]$
$[B \to 011\, a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n] \Rightarrow [B \to a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n]$
$[B \to 101\, a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n] \Rightarrow [B \to a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n]$
$[B \to 110\, a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n] \Rightarrow [B \to a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n]$
$[B \to 111\, a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n] \Rightarrow [B \to a_1 \ldots a_{3k} \ddagger b_1 \ldots b_n]$

(e) Finally, one metarule, to erase the assignment string and generate
the accepting production:

$[B \to \ddagger a_1 \ldots a_n] \Rightarrow [S \to \#]$

As we mentioned above, clauses are erased iff they are true, and
this metarule can only apply if all 3-CNF clauses in F have been
erased. Thus, this metarule is used iff all the clauses are true,
and the formula is satisfiable.

G contains the production $[S \to \#]$ iff F is satisfiable, so $\# \in L(G)$ iff F is
satisfiable.

The result of applying any metarule (aside from those described in the
first construction) is to change a basic rule so that the metarule cannot apply
to it again. Note that the time required for the reduction is essentially the
number of symbols needed to write the grammar down. The reduction can
be performed in $O(m^3 \log m)$ time because the longest metarule is of length
$O(m \log m)$, and there are $O(m^2)$ metarules. ∎

As promised earlier, I now prove that all possible truth assignments can
be generated by the first metarule schema above, subject to the restriction
that a metarule may only operate once in the derivation of a given rule.
This is equivalent to proving that we can generate all binary numbers from
0 to $2^n - 1$ inclusive, using only n + 1 binary numbers and the metarules
in 5.


Lemma 1 The $\frac{1}{2}n(n-1)$ metarules described by the schema 5, which can
exchange any two bit positions i and j in a binary number of length n,

$\forall i, j,\ 1 \le i < j \le n,$

$[a_1 \ldots a_i \ldots a_j \ldots a_n] \Rightarrow [a_1 \ldots a_{i-1}\, a_j\, a_{i+1} \ldots a_{j-1}\, a_i\, a_{j+1} \ldots a_n] \qquad (5)$

can generate all binary numbers from 0 to $2^n - 1$ using only the n + 1
binary numbers $1^i 0^{n-i}$, where i ranges from 0 to n (inclusive), even though
no metarule may apply twice in the derivation of any given binary number.


Proof. Let $\alpha(k)$ be a binary number with k 1's in its binary representation,
$0 \le \alpha < 2^n$, and let $\beta(k) = 1^k 0^{n-k}$ be the k-th binary number. Then the
following algorithm, expressed in a generic programming language, derives
$\alpha(k)$ from $\beta(k)$ using the metarules:

PROCEDURE DERIVE(α, β)

1: for i = 1 to 

2:   if α_i ≠ β_i then do
3:     for j = n to 1 step −1
4:       if α_j ≠ β_j then goto 6
5:     next j
6:     Let β be the result of applying the metarule
         [a_1 ... a_i ... a_j ... a_n] ⇒ [a_1 ... a_{i−1} a_j a_{i+1} ... a_{j−1} a_i a_{j+1} ... a_n]
       to β, switching bit positions i and j in the number β
7: next i

No metarule can be applied twice because i and j are different every time 
a metarule is applied (in line 6). In any derivation, at most metarules 
are applied (see line 1; in fact, at most MIN(k, n − k) metarules are applied
in any derivation). ∎
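
The procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions: binary numbers are represented as strings, positions are numbered from 1 as in the lemma, the function name `derive` is hypothetical, and the outer loop simply visits all n positions (any bound that reaches every mismatched position works). The sketch also assumes, as in the lemma, that β has the form 1^k 0^(n-k) with the same number of 1's as α.

```python
def derive(alpha, beta):
    """Turn beta into alpha by swapping mismatched bit positions.

    Returns a list of (i, j, beta_after_swap) triples, one per metarule
    application, with 1-based positions i < j.
    """
    n = len(alpha)
    bits = list(beta)
    steps = []
    for i in range(1, n + 1):
        if alpha[i - 1] != bits[i - 1]:
            # Scan from the right for the highest position that still differs.
            for j in range(n, 0, -1):
                if alpha[j - 1] != bits[j - 1]:
                    break
            # Apply the metarule that exchanges positions i and j.
            bits[i - 1], bits[j - 1] = bits[j - 1], bits[i - 1]
            steps.append((i, j, "".join(bits)))
    return steps


for i, j, after in derive("0010011110", "1111100000"):
    print(f"swap positions {i} and {j}: {after}")
```

Because i increases strictly from one swap to the next, no pair (i, j) is used twice, which is the property the lemma relies on.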


Example derivation. Let $\alpha(5)$ = 0010011110, a binary number with 5 1's
in its binary representation. Also let $\beta(5)$ = 1111100000, the 5th binary
number. Then the following table illustrates how the algorithm of lemma 1
derives $\alpha(5)$ from $\beta(5)$.

β            metarule used                                                                                                      next β

1111100000   $[a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_{10}] \Rightarrow [a_9 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_1 a_{10}]$   0111100010
0111100010   $[a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_{10}] \Rightarrow [a_1 a_8 a_3 a_4 a_5 a_6 a_7 a_2 a_9 a_{10}]$   0011100110
0011100110   $[a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_{10}] \Rightarrow [a_1 a_2 a_3 a_7 a_5 a_6 a_4 a_8 a_9 a_{10}]$   0010101110
0010101110   $[a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_{10}] \Rightarrow [a_1 a_2 a_3 a_4 a_6 a_5 a_7 a_8 a_9 a_{10}]$   0010011110


2.4 Example reduction from a 3SAT instance 

This section uses an example reduction to provide a concrete illustration of
the preceding proof. Suppose the input 3SAT instance F is:

$F = (a \vee \bar{b} \vee c) \wedge (\bar{a} \vee b \vee \bar{c})$

Then the string of formula literals w is $a\,\bar{b}\,c\,\bar{a}\,b\,\bar{c}$, and the symbols
$q_1$, $q_2$, and $q_3$ refer to the formula variables a, b, and c, respectively.

The reduction algorithm described above constructs the following GPSG
in polynomial time in the length of F. Note that the metarule application
process, and not the reduction algorithm, generates all possible truth
assignments to F’s variables.

1. 4 basic rules, and

$[A \to 000000 \ddagger 000] \quad [A \to 000000 \ddagger 100]$

$[A \to 000000 \ddagger 110] \quad [A \to 000000 \ddagger 111]$


2. 25 metarules

(a) 3 metarules to generate assignments

$[A \to 000000 \ddagger a_1 a_2 a_3] \Rightarrow [A \to 000000 \ddagger a_2 a_1 a_3]$

$[A \to 000000 \ddagger a_1 a_2 a_3] \Rightarrow [A \to 000000 \ddagger a_3 a_2 a_1]$

$[A \to 000000 \ddagger a_1 a_2 a_3] \Rightarrow [A \to 000000 \ddagger a_1 a_3 a_2]$

(b) one metarule to freeze assignment generation

$[A \to 000000 \ddagger a_1 a_2 a_3] \Rightarrow [B \to 000000 \ddagger a_1 a_2 a_3]$

(c) 6 metarules to instantiate formula literals

• 3 metarules for unnegated variables

$[B \to 0\, a_2 a_3 a_4 a_5 a_6 \ddagger 1\, b_2 b_3] \Rightarrow [B \to 1\, a_2 a_3 a_4 a_5 a_6 \ddagger 1\, b_2 b_3]$

$[B \to a_1 a_2\, 0\, a_4 a_5 a_6 \ddagger b_1 b_2\, 1] \Rightarrow [B \to a_1 a_2\, 1\, a_4 a_5 a_6 \ddagger b_1 b_2\, 1]$

$[B \to a_1 a_2 a_3 a_4\, 0\, a_6 \ddagger b_1\, 1\, b_3] \Rightarrow [B \to a_1 a_2 a_3 a_4\, 1\, a_6 \ddagger b_1\, 1\, b_3]$

• 3 metarules for negated variables

$[B \to a_1\, 0\, a_3 a_4 a_5 a_6 \ddagger b_1\, 0\, b_3] \Rightarrow [B \to a_1\, 1\, a_3 a_4 a_5 a_6 \ddagger b_1\, 0\, b_3]$

$[B \to a_1 a_2 a_3\, 0\, a_5 a_6 \ddagger 0\, b_2 b_3] \Rightarrow [B \to a_1 a_2 a_3\, 1\, a_5 a_6 \ddagger 0\, b_2 b_3]$

$[B \to a_1 a_2 a_3 a_4 a_5\, 0 \ddagger b_1 b_2\, 0] \Rightarrow [B \to a_1 a_2 a_3 a_4 a_5\, 1 \ddagger b_1 b_2\, 0]$
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(d) 14 metarules to verify guessed assignments, and 

[B —► 001ai<i2 o 3 t bib 2 b 3 ] [B —* 0 , 10 , 20.3 t & 1 & 2 & 3 ] 

[B —» 010aia 2 a3 t bib 2 b 3 ] => [B —y aia 2 a s t 61 & 2 & 3 ] 

[B —y 100aia2a.3 t bib 2 b 3 ] = 4" [B —> Oia 2 O 3 t ^* 2 ^ 3 ] 

[B —y 011 oia 2 a 3 t 61 & 2 & 3 J => [B —► aia 2 a 3 i bib 2 b 3 ] 

[B —+ 101aia 2 a 3 t 6162 ^ 3 ] =>• [B —> aia 2 a 3 t & 1 & 2 & 3 ] 

[B —y 110a x a2a3 t 6162 M => [B —* 0^03 t bib 2 b 3 ] 

[B —y lllaiti2 a 3 t bib 2 bs] == ^ [B —* <210203 t £> 162 ^ 3 ] 

[B —► 001 t 6162 ^ 3 ] =>■ [B —+ JZ>i B 2 B 3 ] 

[B -» 010 t hhh] => [B -y thb 2 b 3 ] 

[B -> 100 t 616363 ] =* [B^thb 2 b3] 

[B —► Oil t 616263 ] => [B^t*>iM> 3 ] 

[B —y 101 t BX 62 B 3 ] =y- [B —y J&iB 2 B 3 ] 

[B — 110 t M 2 & 3 ] =4- [B —> tbib 2 b 3 ] 

[B -* 111 $ bib 2 b3] => [B -» ihb 2 b 3 ] 

(e) one metarule to create the accepting production 

[B -y taia 2 a 3 ] ==» [5 -+ #] 

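If we apply the metarules to the basic rules constructed above, a proper
subset of the resultant context-free rules is:

$[A \to 000000 \ddagger 010] \quad [A \to 000000 \ddagger 001] \quad [A \to 000000 \ddagger 101] \quad [A \to 000000 \ddagger 011]$

$[B \to 000000 \ddagger 000] \quad [B \to 000000 \ddagger 001] \quad [B \to 000000 \ddagger 010] \quad [B \to 000000 \ddagger 011]$

$[B \to 000000 \ddagger 100] \quad [B \to 000000 \ddagger 101] \quad [B \to 000000 \ddagger 110] \quad [B \to 000000 \ddagger 111]$

$[B \to 010101 \ddagger 000] \quad [B \to 011100 \ddagger 001] \quad [B \to 000111 \ddagger 010] \quad [B \to 001110 \ddagger 011]$

$[B \to 110001 \ddagger 100] \quad [B \to 111000 \ddagger 101] \quad [B \to 100011 \ddagger 110] \quad [B \to 101010 \ddagger 111]$

$[B \to 101 \ddagger 000] \quad [B \to 100 \ddagger 001] \quad [B \to 110 \ddagger 011] \quad [B \to 001 \ddagger 100]$

$[B \to 000 \ddagger 101] \quad [B \to 011 \ddagger 110] \quad [B \to 010 \ddagger 111]$

$[B \to \ddagger 000] \quad [B \to \ddagger 001] \quad [B \to \ddagger 011] \quad [B \to \ddagger 100]$

$[B \to \ddagger 110] \quad [B \to \ddagger 111]$

$[S \to \#]$

The formula F is satisfiable, because the production $[S \to \#]$ is generated
by metarule application.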
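
The construction and the metarule application process can be simulated directly. The following Python sketch is an illustration under stated assumptions, not the paper's own formalization: a rule [lhs → x ‡ y] is encoded as the triple (lhs, x, y), literals are encoded as signed integers (+v for q_v, -v for its negation), metarules are encoded as small tuples, and all function names are hypothetical. The set `used` enforces the restriction that a metarule may operate only once in the derivation of a given rule.

```python
from itertools import combinations


def build_gpsg(clauses, n):
    """Construct the basic rules and metarules (a)-(e) for a 3-CNF formula."""
    w = [lit for clause in clauses for lit in clause]  # string of formula literals
    m = len(w)
    basic = [("A", "0" * m, "1" * i + "0" * (n - i)) for i in range(n + 1)]
    metarules = [("swap", i, j) for i, j in combinations(range(n), 2)]   # (a)
    metarules.append(("freeze",))                                        # (b)
    for i, lit in enumerate(w):                                          # (c)
        metarules.append(("inst", m, i, abs(lit) - 1, "1" if lit > 0 else "0"))
    for k in range(m // 3):                                              # (d)
        for bits in ("001", "010", "100", "011", "101", "110", "111"):
            metarules.append(("erase", bits, 3 * k))
    metarules.append(("accept",))                                        # (e)
    return basic, metarules


def apply_metarule(mr, rule):
    """Return the rule derived from `rule` by `mr`, or None if the pattern does not match."""
    lhs, x, y = rule
    kind = mr[0]
    if kind == "swap" and lhs == "A":
        _, i, j = mr
        s = list(y)
        s[i], s[j] = s[j], s[i]
        return ("A", x, "".join(s))
    if kind == "freeze" and lhs == "A":
        return ("B", x, y)
    if kind == "inst" and lhs == "B":
        _, m, i, j, val = mr
        if len(x) == m and x[i] == "0" and y[j] == val:
            return ("B", x[:i] + "1" + x[i + 1:], y)
    if kind == "erase" and lhs == "B":
        _, bits, rest = mr
        if len(x) == rest + 3 and x.startswith(bits):
            return ("B", x[3:], y)
    if kind == "accept" and lhs == "B" and x == "":
        return ("S", "#", "")
    return None


def derived_rules(basic, metarules):
    """Close the basic rules under metarule application, each metarule once per derivation."""
    seen = {(rule, frozenset()) for rule in basic}
    todo = list(seen)
    while todo:
        rule, used = todo.pop()
        for idx, mr in enumerate(metarules):
            if idx in used:
                continue
            new = apply_metarule(mr, rule)
            if new is not None:
                state = (new, used | {idx})
                if state not in seen:
                    seen.add(state)
                    todo.append(state)
    return {rule for rule, _ in seen}


def satisfiable_via_gpsg(clauses, n):
    basic, metarules = build_gpsg(clauses, n)
    return ("S", "#", "") in derived_rules(basic, metarules)


# The example instance: (a or not b or c) and (not a or b or not c)
F = [(1, -2, 3), (-1, 2, -3)]
basic, metarules = build_gpsg(F, 3)
print(len(basic), len(metarules), satisfiable_via_gpsg(F, 3))
```

For the example instance the sketch builds 4 basic rules and 25 metarules, matching the counts above. Note that `build_gpsg` runs in time polynomial in the formula size, while `derived_rules` does the exponential work of enumerating assignments, which mirrors the distinction between metarule construction and metarule application.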

3 The EP Paradox Resolved


At first glance, a proof that GPSG-Recognition is NP-hard appears to
contradict Gazdar’s efficient parsability argument noted above. GPSGs only
generate CFLs and CFLs can be recognized in polynomial time ($O(n^3)$ for
a sentence of length n). Therefore it would seem that GPSGs can also be
recognized in polynomial time, simply by converting the target GPSG into a
weakly equivalent context-free grammar (CFG) and recognizing using that
CFG.

This argument is misleading because it ignores both the effect
converting the GPSG into a CFG has on grammar size, and the effect grammar
size has on recognition speed. The crux of the matter is that even a highly
constrained GPSG grammar can result in an exponentially larger derived
context-free rule set. Informally, each metarule application can more than
double the size of the GPSG grammar G. Since there are $O(|G|)$ metarules,
the resulting derived grammar can be of size $O(|G| \cdot 2^{|G|})$, that is,
exponentially larger than the GPSG. 3 Standard context-free parsers like the Earley
algorithm actually run in time $O(|G|^2 \cdot n^3)$ where |G| is the CFG size and
n the sentence input length, so the hypothetical GPSG grammar G will be
recognized in time

$O((|G| \cdot 2^{|G|})^2 \cdot n^3)$

Even if the GPSG grammar is held constant, the exponential increase in
derived grammar size will result in an astronomical constant multiplicative
factor, which will dominate the performance of the Earley algorithm for all
expected inputs (that is, those of a million words or less), every time we use
the derived grammar. Thus, in the worst case, if a GPSG with 10 symbols
recognized a given sentence in .001 second, a grammar with 50 symbols
would recognize the same sentence in 35.7 years, and a grammar with 100
symbols could take at least $10^{15}$ centuries. 4 (Gazdar’s (1981) toy grammar

3 This mathematical analysis is vindicated in practice. Phillips and Thompson
(1985:252) observe that in their parser based on the GPSGs of Gazdar (1982), “To expand
the [GPSG] grammar completely . . . would be ridiculously wasteful of space and time.
(The toy grammar of English we use with GPSGP [their parser], of 29 phrase-structure
rules and four metarules, which expands to 85 rules, is equivalent to several tens of millions
of context-free rules.)” Similarly, Shieber (1983:137) notes that typical post-Gazdar (1982)
GPSG systems contain “literally trillions” of derived rules. Ristad (1986b:83) estimates
that the GKPS grammar for English corresponds to at least $10^{33}$ context-free productions.

4 Evans (1985:237) experiences the real-world intractability of GPSG-Recognition first 
hand in his GPSG-based parser, and proposes to manage it by eliminating lexical am- 
contained many hundreds of symbols, so a grammar size of 100 is somewhat
small.)
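
To get a feel for how quickly these bounds grow, the two expressions above can be evaluated for a few grammar sizes. The sketch below only illustrates the growth of the formulas; it drops the constants hidden by the O-notation (an assumption), the function names are hypothetical, and the printed numbers are abstract step counts rather than measured running times.

```python
def derived_grammar_size(g):
    """Bound |G| * 2^|G| on the size of the derived context-free grammar."""
    return g * 2 ** g


def earley_bound(g, n):
    """The bound (|G| * 2^|G|)^2 * n^3 for recognizing a length-n sentence."""
    return derived_grammar_size(g) ** 2 * n ** 3


for g in (10, 20, 50):
    # Sentence length fixed at 10 words; only the grammar size varies.
    print(g, derived_grammar_size(g), earley_bound(g, 10))
```

Holding the sentence length fixed, each additional grammar symbol multiplies the second bound by more than four, which is why the grammar-size factor dominates for any realistic input length.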

The resolution of the EP paradox may also be understood as a special
case of the central distinction between problem complexity and algorithm
complexity. As I mentioned earlier, the complexity of a problem is its inherent
complexity, the computational cost of solving a problem, no matter which
existing or undiscovered algorithm is used. Conversely, the complexity of
an algorithm is the cost of a specific algorithm or procedure for solving a
problem. Thus, the fact that GPSGs can succinctly encode some CFGs
indicates that straightforward use of standard CFG recognition algorithms will
fail to be efficient for GPSGs, because a GPSG is weakly equivalent to a very
large CFG, and CFG size affects recognition time; yet that fact in no way
bears on the complexity of the GPSG recognition problem. The complexity
result, on the other hand, firmly establishes that no known or yet to be
invented algorithm for GPSG-Recognition will be efficient, unless P = NP.

Although known grammar conversion procedures increase both the
grammar size and recognition time for the GPSG, the preceding discussion does
not in principle preclude the possibility of “compiling” the GPSG into a
“fast” grammar. 5 If the compiled grammar is truly fast and assigns the
same structural descriptions as the uncompiled GPSG, and it is possible
to compile the GPSG in practice, then the complexity of the universal RP
would not accurately reflect the real cost of parsing. But until such a
suggestion is forthcoming, I assume that it does not exist.


4 Restricting Metarule Application 

Since the central problem is that GPSG metarules are capable of deriving
any finitely large set of rules, including exponentially large ones, we
must further constrain metarule application if we wish to solve the GPSG-

biguity and by keeping both grammar and input string size as small as possible: “The 
attempts to overcome the time and space problems have only been partially successful . . 
. . The only remedies seem to be, keep phrases as short as possible (for example, do not 
try to test large noun phrases inside complex sentences if it can be avoided — use proper 
nouns instead), make sure no words are duplicated in the lexicon, keep the number of ID 
rules currently loaded down where possible . . . .” 

5 Barton (1985) shows how grammar expansion increases both the space and time costs
of recognition, when compared to the cost of using the grammar directly.
Recognition problem in polynomial time and thereby obtain an efficient
parsability result.

A list of restrictions necessary to remove GPSG-Recognition from the 
class of NP-hard problems is: 6 

• strictly bounded “chaining” — only a constant number of metarules, 
fixed in advance for all GPSG grammars, can operate in the derivation 
of a given context-free rule. 

• each metarule may derive a rule set only polynomially bigger than its
input rule set.

• a metarule may only use “abbreviatory variables.” 

5 Defining the Recognition Problem 


Following complexity theory practice, I use the universal recognition
problem (given a grammar G and an input string x, is $x \in L(G)$?) to
formally analyze GPSG’s efficient parsability (EP) claims. Alternatively, the
recognition problem (RP) for a class of grammars may be defined as the
fixed language RP (FLRP): given an input string x, is $x \in L$ for some fixed
language L? For the FLRP, it does not matter which grammar is chosen to
generate L; typically, the fastest grammar is picked.

It seems reasonably clear that the universal RP is of greater linguistic
and engineering interest than the FLRP. The grammars licensed by linguistic
theory assign structural descriptions to utterances, which are interpreted
semantically, translated into other human languages, and so on. The universal
RP, unlike the FLRP, determines membership with respect to a grammar,
and therefore more accurately models the parsing problem, which must use
a grammar to assign structural descriptions.

6 In order to guarantee that these three restrictions are sufficient, GPSG must be
completely and exactly formally specified, in a manner which ensures that proliferation of
categories will not make the recognition problem intractable. Another aspect of current
GPSG formulations which makes them NP-hard, and probably intractable, is the
immediate dominance/linear precedence (ID/LP) formalism. See Barton (1985) for a proof.
Note that the linguistically untenable restriction of prohibiting metarule variables of any
kind is probably sufficient, when coupled with restrictions on ID/LP, to guarantee
polynomial time recognition. Such a restriction would mean that a metarule, which may only
“match” one basic rule, can only derive exactly one rule. The size of the derived
context-free rule set would be the size of the basic rule set plus the number of metarules. This
restriction is linguistically unmotivated because it fails to capture linguistically
important generalizations. For example, any metarule applying to singular and plural sentences
would have to be replicated at least twice: once to handle the singular case, and once to
handle the plural case.

The universal RP also bears most directly on issues of natural language
acquisition. The language learner evidently possesses a mechanism for
selecting grammars from the class of learnable natural language grammars C_G
on the basis of linguistic inputs. The more fundamental question for linguistic
theory, then, is “what is the recognition complexity of the class C_G?” If
this problem should prove computationally intractable, then the (potential)
tractability of the problem for each language generated by a G in the class
is only a partial answer to the linguistic questions raised.

Finally, complexity considerations favor the universal RP. The goal of a
complexity analysis is to characterize the amount of computational resources
(e.g. time, space) needed to solve the problem in terms of all computationally
relevant inputs. We know that both input string length and grammar
size and structure affect the complexity of the RP. Hence, excluding either
input from complexity consideration in order to argue that the RP for a family
of grammars is tractable would not advance our scientific understanding. 7
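
To make the dependence on both inputs concrete, the following is a minimal
sketch of a universal recognizer, assuming a context-free grammar already in
Chomsky normal form. It uses the standard CYK algorithm rather than any
algorithm discussed in this paper; the function name cyk_recognize, the tuple
encoding of rules, and the step counter are assumptions made for illustration.
The counter records how many (span, split point, rule) combinations are
examined. CYK's cost is linear in the number of binary rules, rather than
quadratic in grammar size as in the Earley bound, but the point is the same:
the grammar is an input, and its size is part of the cost.

```python
def cyk_recognize(binary_rules, lexical_rules, start, s):
    """Universal recognition for a CNF grammar: is s in L(G)?

    binary_rules:  list of (A, B, C) tuples, meaning A -> B C
    lexical_rules: list of (A, a) tuples, meaning A -> a
    Returns (accepted, steps), where steps counts the
    (span, split point, binary rule) combinations examined.
    """
    n = len(s)
    if n == 0:
        return False, 0  # this sketch does not handle the empty string
    # table[i][j] holds the categories that derive the substring s[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, symbol in enumerate(s):
        table[i][i + 1] = {a for a, t in lexical_rules if t == symbol}
    steps = 0
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length                  # span end
            for k in range(i + 1, j):       # split point
                for a, b, c in binary_rules:
                    steps += 1
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j].add(a)
    return start in table[0][n], steps


# Toy CNF grammar for { a^n b^n : n >= 1 }.
binary = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]
lexical = [("A", "a"), ("B", "b")]

print(cyk_recognize(binary, lexical, "S", "aabb"))
print(cyk_recognize(binary * 2, lexical, "S", "aabb"))
```

For this toy grammar, recognizing "aabb" examines 30 combinations; duplicating
every binary rule doubles that to 60 without changing the language.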

Linguistics and computer science are primarily interested in the universal
RP because both disciplines are concerned with the formal power of a family
of grammars. Barton, Berwick, and Ristad (1986, forthcoming) elaborate
and extend these arguments.

7 This “consider all relevant inputs” methodology is universally assumed in the formal
language and computational complexity literature. For example, Hopcroft and Ullman
(1979:139,346) define the context-free grammar RP as “Given a CFG G = (V, T, P, S) and
a string x in T*, is x in L(G)?”, and the context-sensitive language RP as “Given a CSG G
and a string w, is w in L(G)?” Garey and Johnson (1979) is a standard reference work in
the field of computational complexity. All 10 automata and language recognition problems
covered in the book (pp. 265-271) are universal, i.e. of the form “Given an instance of
a machine/grammar and an input, does the machine/grammar accept its input?” The
complexity of these RPs is always calculated in terms of grammar and input size.


6 The Complexity of Succinctness


Section 3 resolved the EP paradox with an argument that superficially linked
representational succinctness with computational complexity: exponential
grammar expansion retards CFG recognition times when using standard
algorithms. Therefore, it may be tempting to blame intractability on expressive
economy. However, there is no causal relationship between succinctness
and intractability, simply because the two notions are mathematically
distinct.

Complexity results characterize the amount of resources needed to solve
instances of a problem, while succinctness results measure the space reduction
gained by one representation over another, equivalent, representation.
There is no causal connection between computational complexity
and representational succinctness, either in practice or principle. In practice,
converting one grammar into a more succinct one can either increase
or decrease the recognition cost. For example, converting an instance of
context-free recognition (known to be efficient) into an instance of
context-sensitive recognition (thought to be intractable) can significantly speed the
RP if the conversion decreases the size of the CFG logarithmically or better.
Even more strangely, increasing ambiguity in a CFG can speed recognition
time if the grammar size is reduced sufficiently and slow it down otherwise:
unambiguous CFGs can be recognized in linear time O(|G|^2 · n), while
ambiguous ones can require cubic time O(|G|^2 · n^3). Berwick and Weinberg
(1982) discuss these issues in greater detail.

In principle, tractable problems may involve succinct representations.
For example, the iterating coordination schema (ICS) of GPSG is an
unbeatably succinct encoding of an infinite set of context-free rules; from a
computational complexity viewpoint, parsing with the ICS is utterly trivial
using a slightly modified Earley algorithm. 8 Tractable problems may also
include verbose representations: consider a random finite language, which
may be recognized in essentially constant time on a typical computer, yet
whose elements must be individually listed. Similarly, intractable problems
can involve either succinct or nonsuccinct representations. As is well known,
the Turing machine for an arbitrary recursively enumerable set may be
arbitrarily big or small.

8 A more extreme example of the unrelatedness of succinctness and complexity is the
absolute succinctness with which Σ*, the dense language of all strings over the alphabet
Σ, may be represented (whether by a regular expression, CFG, or even Turing machine),
yet members of Σ* may be recognized in constant time (i.e. always accept).

The complexity result shows that GPSGs are not merely succinct encodings
of some context-free grammars; they are inherently complex grammars
for some context-free languages.


7 Conclusion 

A central goal of this paper has been to define the framework within which
efficient parsability claims are best evaluated. Gazdar (1981) claims to offer
the beginnings of an explanation for efficient parsability, yet the universal
recognition problem for the GPSGs of Gazdar (1981) is provably NP-hard.
Inasmuch as his argument overlooks a computational problem that is likely
to be intractable, any support for GPSG on this basis is extremely weak.
Metarules, even when severely constrained, are one source of GPSG’s
complexity.

The moral of this result is that, as far as we know, casual appeal to
general mathematical results is not likely to rescue efficient parsability
results. Specific constraints on the particular representations postulated by
linguistic theory are needed to explain efficient linguistic processing. This
does not imply that GPSG theory is without merit: on the contrary, I have
merely shown that its particular efficient parsability thesis cannot be
maintained. Generalized phrase structure grammar, lexical functional grammar,
and transformational grammar are all probably intractable in an abstract
mathematical sense, and each theory must search elsewhere for an explanation
of efficient natural language parsing, if one is to be given at all. 9